Metabolomics facilitates differential diagnosis in common inherited retinal degenerations by exploring their profiles of serum metabolites

The diagnosis of inherited retinal degeneration (IRD) is challenging owing to its phenotypic and genotypic complexity. Clinical information is important before a genetic diagnosis is made. Metabolomics studies the full profile of bioproducts, which are determined by the genetic code and biological reactions. We demonstrated that the common diagnoses of IRD, including retinitis pigmentosa (RP), cone-rod dystrophy (CRD), Stargardt disease (STGD), and Bietti’s crystalline dystrophy (BCD), could be differentiated based on their metabolite heatmaps. In volcano plots comparing each IRD with the control group, hundreds of metabolites were identified as potential diagnostic markers in every IRD except BCD. The phenotypes of CRD and STGD overlapped but could be differentiated by their metabolomic features with the assistance of a machine learning model with 100% accuracy. Moreover, EYS-associated, USH2A-associated, and other RP, which share considerably similar clinical findings, could also be distinguished using the machine learning model with 85.7% accuracy. Further study is needed to validate the results in an external dataset. By incorporating mass spectrometry and machine learning, our study proposes a metabolomics-based diagnostic workflow for the clinical and molecular diagnoses of IRD.


Serum metabolite extraction
The MTBE extraction protocol, with laboratory modifications, was used to extract lipids and polar metabolites from the serum. The 10 μL internal standards (IS) mixture

LC-MS analysis
Untargeted metabolomic analyses relied on ultra-performance liquid chromatography-tandem mass spectrometry (UPLC-MS/MS). The platform comprised a Thermo Scientific UltiMate 3000 UHPLC system and a Q Exactive Plus Hybrid Quadrupole-Orbitrap MS interfaced with a heated electrospray ionization probe. A 5 μL volume of the upper (hydrophobic) and lower (hydrophilic) portions of the serum extract was separated using a C18 column (Waters UPLC CSH C18: 2.1 × 100 mm, 1.7 μm) and a hydrophilic interaction liquid chromatography (HILIC) column (Waters UPLC BEH Amide: 2.1 × 150 mm, 1.7 μm), respectively. The MS scan range covered 150-1,500 and 70-1,000 m/z for the hydrophobic and hydrophilic portion analyses, respectively. Each portion was analyzed under positive and negative ion modes, with spray voltages of 3.5 and -3.5 kV, respectively. The MS operated at a heater temperature of 180 °C, a capillary temperature of 280 °C, a sheath gas flow of 35 arb, and an auxiliary gas flow of 15 arb. The Orbitrap mass analyzer was set to a mass resolution of 35,000.
The injection order of the samples was randomized for the LC-MS analysis, and pooled quality control samples were injected at the beginning, after every 10 injections, and at the end of each batch to ensure spectral quality.
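The run-order scheme described above can be sketched in a few lines of Python. This is only an illustration, not the authors' procedure: the function name, the `QC` label, the fixed seed, and the reading of "every 10 injections" as "after every 10 sample injections" are all assumptions.

```python
import random

def build_injection_sequence(sample_ids, qc_label="QC", qc_every=10, seed=0):
    """Return a randomized run order with pooled QC injections interleaved.

    Hypothetical helper: QC placement follows one reading of the text
    (start of batch, after every `qc_every` samples, end of batch).
    """
    rng = random.Random(seed)   # fixed seed so the run order is reproducible
    order = list(sample_ids)
    rng.shuffle(order)          # randomize the sample injection order

    sequence = [qc_label]       # QC at the beginning of the batch
    for i, sample in enumerate(order, start=1):
        sequence.append(sample)
        # QC after every `qc_every` samples, unless the batch ends here
        if i % qc_every == 0 and i != len(order):
            sequence.append(qc_label)
    sequence.append(qc_label)   # QC at the end of the batch
    return sequence

# Example batch of 25 hypothetical sample IDs
samples = [f"S{i:02d}" for i in range(1, 26)]
sequence = build_injection_sequence(samples)
```

Fixing the seed keeps the randomized order reproducible, which makes it easier to record the sequence alongside the raw data files.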
The upper portion of the extract containing the relatively hydrophobic metabolites was gradient-eluted using IPA, ACN, and water (A: ACN/H2O, v/v = 4/6; B:

Data preprocessing
UPLC-MS/MS data were processed using Thermo Scientific Compound Discoverer v3.2 software. Based on the cubic-line retention time alignment algorithm, the retention times were aligned to the references selected using the software. Next, only features with intensities above 1,000,000 and 500,000 for the positive and negative datasets, respectively, were included for further analysis. M+H, M+Na, M+K, M+H-H2O, M+H-NH3, M+2H, M-H, M-H-H2O, and M-2H were considered possible adduct ions, and each ion was assigned a predicted chemical composition according to its exact mass (< 5 ppm) and isotopic pattern (intensity tolerance < 30%). Different adduct ions from the same compound were grouped, and only metabolites that appeared in at least half of the samples in any IRD subtype or in healthy controls were retained.
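To make the two numeric criteria above concrete, the following minimal Python sketch shows a mass-accuracy check at 5 ppm and the "at least half of the samples in any group" rule. It does not reproduce Compound Discoverer; the function names and example values are hypothetical, and unlabeled L-tryptophan is used purely as an illustrative compound.

```python
PROTON_MASS = 1.007276  # mass of a proton in Da

def ppm_error(observed_mz, theoretical_mz):
    """Signed mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6

def matches_formula(observed_mz, theoretical_mz, tol_ppm=5.0):
    """True if the observed m/z lies within the ppm tolerance."""
    return abs(ppm_error(observed_mz, theoretical_mz)) < tol_ppm

def keep_feature(detected_by_group, min_fraction=0.5):
    """Keep a feature detected in >= min_fraction of samples in ANY group.

    `detected_by_group` maps a group name to a list of booleans,
    one per sample (True = feature detected in that sample).
    """
    return any(
        sum(flags) / len(flags) >= min_fraction
        for flags in detected_by_group.values()
        if flags
    )

# Illustrative [M+H]+ ion: monoisotopic mass of L-tryptophan plus a proton
tryptophan_mh = 204.089878 + PROTON_MASS
```

For instance, an observed peak at m/z 205.0975 would fall within 5 ppm of this theoretical value, whereas one at m/z 205.1000 would not.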
Undetectable metabolites in the samples were filled using background noise. The signal of each metabolite was normalized to the corresponding IS for each dataset. The reference standards for the hydrophobic-positive, hydrophobic-negative, and hydrophilic (both positive and negative) datasets were 15:0-18:1-d7-PC, 15:0-18:1-d7-PG, and L-tryptophan-(indole-d5), respectively. Finally, features with peak intensities lower than three times those of the blank samples were excluded, and 1,606 compounds remained after the above processing.
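The gap filling, IS normalization, and blank filtering steps can be illustrated with a short Python sketch. The details are assumptions rather than the software's actual behavior: the noise level is passed in as a single number, and the blank filter compares mean intensities because the text does not state how samples and blanks were aggregated.

```python
def fill_missing(intensities, noise_level):
    """Replace undetected values (None) with a background noise level."""
    return [noise_level if x is None else x for x in intensities]

def normalize_to_is(intensities, is_intensities):
    """Divide each sample's signal by the internal standard signal
    measured in the same sample."""
    return [x / s for x, s in zip(intensities, is_intensities)]

def passes_blank_filter(sample_intensities, blank_intensities, ratio=3.0):
    """Keep a feature whose mean sample intensity is at least `ratio`
    times the mean blank intensity (aggregation by mean is an assumption)."""
    mean_sample = sum(sample_intensities) / len(sample_intensities)
    mean_blank = sum(blank_intensities) / len(blank_intensities)
    return mean_sample >= ratio * mean_blank

# Hypothetical feature measured in three samples, one of them undetected
raw = [1200.0, None, 900.0]
filled = fill_missing(raw, noise_level=100.0)
normalized = normalize_to_is(filled, [600.0, 500.0, 450.0])
```

Normalizing within each dataset to its own IS, as the text describes, means that the ratios are comparable across samples of one acquisition mode but not directly across modes.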

Statistical analysis
Next, multivariate and univariate statistical analyses were conducted using the MetaboAnalyst 5.0 online platform [41]. The heatmap was constructed using auto-scaling for feature standardization, Pearson distance measurement, and the Ward clustering method. Partial least squares-discriminant analysis was performed on normally distributed data. Significant features in the volcano plot were defined as those with a false discovery rate < 0.05 (Benjamini-Hochberg test) and fold change > 2.
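The volcano-plot criterion can be expressed as a small Python sketch. It is not MetaboAnalyst's code: the Benjamini-Hochberg adjustment is the standard step-up procedure, while treating "fold change > 2" as symmetric (above 2 or below 0.5) is an assumption about how the threshold was applied. All p-values and fold changes below are made up for illustration.

```python
import math

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

def significant_features(p_values, fold_changes, fdr=0.05, fc=2.0):
    """Flag features with adjusted p < fdr and |log2 FC| > log2(fc).

    Assumption: the fold-change threshold applies in both directions.
    """
    q_values = benjamini_hochberg(p_values)
    return [
        q < fdr and abs(math.log2(f)) > math.log2(fc)
        for q, f in zip(q_values, fold_changes)
    ]

# Made-up example values for five features
p_values = [0.001, 0.01, 0.02, 0.2, 0.5]
fold_changes = [4.0, 0.2, 1.5, 3.0, 8.0]
flags = significant_features(p_values, fold_changes)
```

In this example the third feature passes the FDR threshold but not the fold-change threshold, and the last two features fail on FDR despite large fold changes.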

Machine learning and diagnostic model construction
The least absolute shrinkage and selection operator (LASSO) algorithm was applied to establish two diagnostic models: one for classifying CD/CRD, STGD, and normal participants, and the other for predicting the genotype of RP patients. The RP genotyping model discriminated between USH2A, EYS, and other samples, including ABCA4 and PRPF31 mutants. In this study, RapidMiner Studio (version 9.10.001), a commercially available data mining software package, was used to construct the machine learning models. To implement the LASSO model, the built-in generalized linear model was used with the alpha parameter set to 1, indicating the use of an L1 penalty.
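For orientation, the elastic-net objective used by many generalized linear model implementations is written out below. Whether RapidMiner's built-in model uses exactly this parameterization is an assumption; the symbols (coefficients β, log-likelihood ℓ, sample count N) are introduced here for illustration only.

```latex
\hat{\beta} = \arg\min_{\beta}\;
  -\frac{1}{N}\,\ell(\beta)
  + \lambda \left( \alpha\,\lVert \beta \rVert_1
  + \frac{1-\alpha}{2}\,\lVert \beta \rVert_2^2 \right)
```

Under this form, setting α = 1 removes the squared (ridge) term and leaves only the L1 penalty, which is what drives some coefficients exactly to zero and thereby selects features.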
The lambda parameter, which controls the degree of regularization, was determined using the "lambda search" function within RapidMiner. The optimization process was programmed to terminate when the relative improvement fell below 0.001. The MS data were randomly separated into training and validation sets at a ratio of 7 to 3 for each subtype. The LASSO models were trained using the training dataset and evaluated using leave-one-out cross-validation. After training, each model was further evaluated using the validation set, and the area under the ROC curve was calculated for the training and validation datasets separately.
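The data partitioning described above can be sketched in plain Python. This covers only the per-subtype 7:3 split and the leave-one-out folds, not the LASSO fit itself, and it is not RapidMiner's implementation; the function names, seed, and example labels are hypothetical.

```python
import random
from collections import defaultdict

def stratified_split(labels, train_fraction=0.7, seed=0):
    """Split sample indices into training/validation sets per class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, valid = [], []
    for indices in by_class.values():
        rng.shuffle(indices)                       # random assignment within a class
        n_train = round(len(indices) * train_fraction)
        train.extend(indices[:n_train])
        valid.extend(indices[n_train:])
    return sorted(train), sorted(valid)

def leave_one_out(indices):
    """Yield (training indices, held-out index) for each sample in turn."""
    for held_out in indices:
        yield [i for i in indices if i != held_out], held_out

# Hypothetical label vector: 10 samples per RP genotype group
labels = ["USH2A"] * 10 + ["EYS"] * 10 + ["other"] * 10
train_idx, valid_idx = stratified_split(labels)
folds = list(leave_one_out(train_idx))
```

Splitting within each subtype keeps the class proportions similar in both sets, which matters when some genotype groups contain only a few patients.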
containing 15:0-18:1-d7-PC (2 ppm), 15:0-18:1-d7-PG (2 ppm), and L-tryptophan-(indole-d5) (10 ppm) was spiked into a 50 μL aliquot of serum. The sample was then extracted by adding 600 μL MTBE and 150 μL MeOH and vortexing for 30 min at room temperature. Next, 200 μL water was added, and the sample was centrifuged for 3 min at 13,697 × g for phase separation. The upper portion containing serum lipids was transferred to another tube. The extraction was repeated by adding 100 μL water, 100 μL MeOH, and 300 μL MTBE. The sample was vortexed for an additional 10 min and centrifuged for 3 min at 13,697 × g. The upper portion was mixed with the lower portion, and the combined solution was dried in a vacuum concentrator (Vacufuge plus Vacuum Concentrator, Eppendorf) for 3 h. The sample was reconstituted by adding 100 μL of reconstitution solution (ACN/IPA/water, v/v/v = 65/30/5). For the lower portion, 150 μL cold MeOH was added, and the sample was stored at -20 °C for 2 h, followed by centrifugation at 21,401 × g for 10 min for protein precipitation. Next, the supernatant was dried in a vacuum concentrator overnight and then reconstituted by adding 100 μL of reconstitution solution (ACN/water, v/v = 50/50). The protein precipitation was repeated by mixing 60 μL of the reconstituted sample with 120 μL cold ACN and then placing the mixture in a -20 °C freezer for 1 h. After centrifugation for 15 min at 21,401 × g and 4 °C, the supernatants (120 μL) were collected and stored at -80 °C before further analysis.

Table 2. Number of significant features in each comparison, which were used to quantify the metabolic differences between each IRD subtype and healthy controls.
# In all IRD groups, p = 0.709. Significant features are defined as a false discovery rate < 0.05 (Benjamini-Hochberg test, two-sided) and fold change > 2. BCD, Bietti's crystalline dystrophy; CD/CRD, cone dystrophy/cone-rod dystrophy; IRD,